133 research outputs found
Sparse4D: Multi-view 3D Object Detection with Sparse Spatial-Temporal Fusion
Bird's-eye-view (BEV) based methods have recently made great progress on the
multi-view 3D detection task. Compared with BEV-based methods, sparse-based
methods lag behind in performance, but still have many non-negligible
merits. To push sparse 3D detection further, in this work we introduce a
novel method, named Sparse4D, which iteratively refines anchor boxes by
sparsely sampling and fusing spatial-temporal features. (1) Sparse 4D
Sampling: for each 3D anchor, we assign multiple 4D keypoints, which are
then projected onto multi-view/scale/timestamp image features to sample the
corresponding features; (2) Hierarchy Feature Fusion: we hierarchically fuse
the sampled features across different views/scales, different timestamps,
and different keypoints to generate a high-quality instance feature. In this
way, Sparse4D can efficiently and effectively achieve 3D detection without
relying on dense view transformation or global attention, and is friendlier
to edge-device deployment. Furthermore, we introduce an instance-level depth
reweight module to alleviate the ill-posed issue in 3D-to-2D projection. In
experiments, our method outperforms all sparse-based methods and most
BEV-based methods on the detection task of the nuScenes dataset.
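To make the sparse spatial-temporal sampling concrete, below is a minimal
PyTorch sketch of projecting per-anchor keypoints into one camera's
multi-scale features and fusing the samples. The single-camera,
single-timestamp setup, the normalized projection convention, and the
mean-based fusion are simplifying assumptions, not the Sparse4D
implementation (the paper fuses hierarchically across views, scales,
timestamps, and keypoints).

```python
# Minimal sketch of sparse 4D sampling: illustrative only, not the authors' code.
import torch
import torch.nn.functional as F

def project_to_image(keypoints_3d, cam_matrix):
    """Project 3D keypoints (N, K, 3) to grid_sample coords in [-1, 1].

    cam_matrix is assumed to be a 3x4 projection mapping to normalized
    [0, 1] image coordinates.
    """
    homo = F.pad(keypoints_3d, (0, 1), value=1.0)            # (N, K, 4) homogeneous
    uvw = homo @ cam_matrix.T                                # (N, K, 3)
    uv = uvw[..., :2] / uvw[..., 2:3].clamp(min=1e-5)        # perspective divide
    return uv * 2.0 - 1.0                                    # [0, 1] -> [-1, 1]

def sparse_sample(feature_maps, anchors_3d, offsets, cam_matrix):
    """Sample and fuse features for each anchor at per-anchor keypoints.

    feature_maps: list of (C, H, W) maps at different scales (one camera,
                  one timestamp, for brevity).
    anchors_3d:   (N, 3) anchor centers.
    offsets:      (N, K, 3) keypoint offsets around each anchor.
    """
    keypoints = anchors_3d[:, None, :] + offsets             # (N, K, 3)
    grid = project_to_image(keypoints, cam_matrix)[None]     # (1, N, K, 2)
    sampled = []
    for fmap in feature_maps:                                # scale hierarchy
        feat = F.grid_sample(fmap[None], grid, align_corners=False)  # (1, C, N, K)
        sampled.append(feat[0].permute(1, 2, 0))             # (N, K, C)
    # Plain means stand in for the paper's hierarchical fusion.
    per_keypoint = torch.stack(sampled).mean(dim=0)          # fuse scales -> (N, K, C)
    return per_keypoint.mean(dim=1)                          # fuse keypoints -> (N, C)

feats = [torch.randn(64, 32 * s, 56 * s) for s in (1, 2)]    # two feature scales
inst = sparse_sample(feats, torch.randn(10, 3),
                     0.5 * torch.randn(10, 7, 3), torch.rand(3, 4))
print(inst.shape)                                            # torch.Size([10, 64])
```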
Binarized Convolutional Neural Networks with Separable Filters for Efficient Hardware Acceleration
State-of-the-art convolutional neural networks are enormously costly in both
compute and memory, demanding massively parallel GPUs for execution. Such
networks strain the computational capabilities and energy available to embedded
and mobile processing platforms, restricting their use in many important
applications. In this paper, we push the boundaries of hardware-effective CNN
design by proposing BCNN with Separable Filters (BCNNw/SF), which applies
Singular Value Decomposition (SVD) on BCNN kernels to further reduce
computational and storage complexity. To enable its implementation, we provide
a closed form of the gradient over SVD to calculate the exact gradient with
respect to every binarized weight in backward propagation. We verify BCNNw/SF
on the MNIST, CIFAR-10, and SVHN datasets, and implement an accelerator for
CIFAR-10 on FPGA hardware. Our BCNNw/SF accelerator realizes memory savings of
17% and execution time reduction of 31.3% compared to BCNN, with only minor
accuracy sacrifices.
Comment: 9 pages, 6 figures, accepted for Embedded Vision Workshop (CVPRW)
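As a rough illustration of the core idea, the sketch below factors a 2D
kernel into rank-1 separable filters via SVD and binarizes each factor with
an XNOR-style scale; the binarization convention is an assumption, and the
paper's closed-form gradient over SVD is not reproduced.

```python
# Illustrative sketch of separable binarized filters, not the paper's training code.
import numpy as np

def separable_binarize(kernel):
    """Approximate a K x K kernel by binarized rank-1 factors (u, v)."""
    U, S, Vt = np.linalg.svd(kernel)
    u = U[:, 0] * np.sqrt(S[0])                  # column (vertical) filter
    v = Vt[0] * np.sqrt(S[0])                    # row (horizontal) filter
    # Binarize each factor to {-1, +1} with a per-filter scale, as in
    # XNOR-style binarization (an assumed convention).
    return np.sign(u) * np.abs(u).mean(), np.sign(v) * np.abs(v).mean()

kernel = np.random.randn(3, 3)
bu, bv = separable_binarize(kernel)
approx = np.outer(bu, bv)                        # separable reconstruction
print(np.abs(kernel - approx).mean())            # rank-1 approximation error
```

Storing two length-K binary vectors plus scales instead of a K x K binary
kernel, and convolving with two 1D filters instead of one 2D filter, is
where the memory and execution-time savings come from.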
DynStatF: An Efficient Feature Fusion Strategy for LiDAR 3D Object Detection
Augmenting LiDAR input with multiple previous frames provides richer
semantic information and thus boosts performance in 3D object detection.
However, crowded point clouds in multiple frames can hurt precise position
information due to motion blur and inaccurate point projection. In this
work, we
propose a novel feature fusion strategy, DynStaF (Dynamic-Static Fusion), which
enhances the rich semantic information provided by the multi-frame (dynamic
branch) with the accurate location information from the current single-frame
(static branch). To effectively extract and aggregate complementary features,
DynStaF contains two modules, Neighborhood Cross Attention (NCA) and
Dynamic-Static Interaction (DSI), operating through a dual pathway
architecture. NCA takes the features in the static branch as queries and the
features in the dynamic branch as keys (values). When computing the attention,
we address the sparsity of point clouds and take only neighborhood positions
into consideration. NCA fuses the two features at different feature map
scales, followed by DSI, which provides comprehensive interaction. To
analyze our
proposed strategy DynStaF, we conduct extensive experiments on the nuScenes
dataset. On the test set, DynStaF increases the performance of PointPillars in
NDS by a large margin from 57.7% to 61.6%. When combined with CenterPoint, our
framework achieves 61.0% mAP and 67.7% NDS, leading to state-of-the-art
performance without bells and whistles.
Comment: Accepted to CVPR2023 Workshop on End-to-End Autonomous Driving
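Below is a minimal single-head sketch of neighborhood-restricted cross
attention in the spirit of NCA: static-branch features act as queries over a
local window of dynamic-branch features serving as keys and values. The
window size and the omission of learned query/key/value projections are
simplifying assumptions.

```python
# Illustrative neighborhood cross-attention sketch, not the DynStaF code.
import torch
import torch.nn.functional as F

def neighborhood_cross_attention(static_feat, dynamic_feat, window=3):
    """static_feat, dynamic_feat: (B, C, H, W); returns fused (B, C, H, W)."""
    B, C, H, W = static_feat.shape
    pad = window // 2
    # Gather a window x window neighborhood of dynamic features per position.
    neigh = F.unfold(dynamic_feat, window, padding=pad)       # (B, C*w*w, H*W)
    neigh = neigh.view(B, C, window * window, H * W)
    q = static_feat.view(B, C, 1, H * W)                      # static branch = queries
    attn = (q * neigh).sum(dim=1, keepdim=True) / C ** 0.5    # dot products over C
    attn = attn.softmax(dim=2)                                # only w*w neighbors compete
    out = (attn * neigh).sum(dim=2)                           # weighted sum of values
    return out.view(B, C, H, W)

fused = neighborhood_cross_attention(torch.randn(1, 16, 8, 8),
                                     torch.randn(1, 16, 8, 8))
print(fused.shape)                                            # torch.Size([1, 16, 8, 8])
```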
DeltaEdit: Exploring Text-free Training for Text-Driven Image Manipulation
Text-driven image manipulation remains challenging in terms of training and
inference flexibility. Conditional generative models depend heavily on
expensive annotated training data. Meanwhile, recent frameworks that
leverage pre-trained vision-language models are limited by either
per-text-prompt optimization or inference-time hyper-parameter tuning. In
this work, we propose a novel framework named DeltaEdit to address these
problems.
Our key idea is to identify a space, namely the delta image-and-text space,
in which the distribution of CLIP visual feature differences between two
images is well aligned with that of CLIP textual embedding differences
between source and target texts. Based on this CLIP delta space, the
DeltaEdit network is designed to map CLIP visual feature differences to the
editing directions of StyleGAN in the training phase. Then, in the inference
phase, DeltaEdit predicts the StyleGAN editing directions from the
differences of the CLIP textual features. In this way, DeltaEdit is trained
in a text-free manner. Once trained, it generalizes well to various text
prompts for zero-shot inference
without bells and whistles. Code is available at
https://github.com/Yueming6568/DeltaEdit.
Comment: Accepted by CVPR2023. Code is available at
https://github.com/Yueming6568/DeltaEdit
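A minimal sketch of the text-free training idea follows: a mapper network is
trained on CLIP image-feature deltas paired with StyleGAN latent deltas,
then reused at inference on CLIP text-feature deltas. The network shape,
loss, and feature dimensions are illustrative assumptions, not the DeltaEdit
architecture.

```python
# Illustrative sketch of training in CLIP delta space, not the DeltaEdit code.
import torch
import torch.nn as nn
import torch.nn.functional as F

CLIP_DIM, STYLE_DIM = 512, 512                    # assumed feature sizes

mapper = nn.Sequential(                           # CLIP delta -> edit direction
    nn.Linear(CLIP_DIM, 1024), nn.ReLU(),
    nn.Linear(1024, STYLE_DIM),
)

def training_step(clip_img_a, clip_img_b, w_a, w_b, optimizer):
    """Text-free training: supervised only by image pairs and their latents."""
    delta_clip = clip_img_b - clip_img_a          # CLIP visual feature difference
    delta_w = w_b - w_a                           # StyleGAN latent difference
    loss = F.mse_loss(mapper(delta_clip), delta_w)
    optimizer.zero_grad(); loss.backward(); optimizer.step()
    return loss.item()

@torch.no_grad()
def edit_direction(clip_text_src, clip_text_tgt):
    """Zero-shot inference: feed a *text* delta through the same mapper."""
    return mapper(clip_text_tgt - clip_text_src)
```

Because the mapper only ever consumes deltas in the shared CLIP space,
swapping image deltas for text deltas at inference is what removes the need
for text during training.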
A Unified Framework for Analyzing and Detecting Malicious Examples of DNN Models
Deep Neural Networks are well known to be vulnerable to adversarial attacks
and backdoor attacks, where minor modifications of the input can mislead the
models into giving wrong results. Although defenses against adversarial
attacks have been widely studied, research on mitigating backdoor attacks is
still at an early stage. It is unknown whether there are any connections or
common characteristics between the defenses against these two attacks. In
this paper,
we present a unified framework for detecting malicious examples and protecting
the inference results of Deep Learning models. This framework is based on our
observation that both adversarial examples and backdoor examples have anomalies
during the inference process, highly distinguishable from benign samples. As a
result, we repurpose and revise four existing adversarial defense methods
for detecting backdoor examples. Extensive evaluations indicate that these
approaches provide reliable protection against backdoor attacks, with higher
accuracy than when detecting adversarial examples. These solutions also
reveal the relations among adversarial examples, backdoor examples, and
normal samples in model sensitivity, activation space, and feature space.
This can enhance our understanding of the inherent features of these two
attacks, as well as of the defense opportunities.
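As a concrete instance of the underlying observation, the sketch below
scores hidden-layer activations against benign statistics (a
Mahalanobis-style detector) and flags anomalous inputs. It illustrates the
genre of defense the paper repurposes, not any of the four specific methods
it evaluates.

```python
# Illustrative activation-space anomaly detector, not the paper's framework.
import numpy as np

class ActivationAnomalyDetector:
    def fit(self, benign_acts):
        """benign_acts: (N, D) hidden-layer activations from clean inputs."""
        self.mu = benign_acts.mean(axis=0)
        cov = np.cov(benign_acts, rowvar=False)
        self.prec = np.linalg.inv(cov + 1e-6 * np.eye(cov.shape[0]))
        # Threshold at the 99th percentile of benign scores (assumed policy).
        self.thresh = np.percentile(self.score(benign_acts), 99)
        return self

    def score(self, acts):
        d = acts - self.mu
        return np.einsum('nd,de,ne->n', d, self.prec, d)  # squared Mahalanobis

    def is_malicious(self, acts):
        """True where a sample (adversarial or backdoor) looks anomalous."""
        return self.score(acts) > self.thresh

det = ActivationAnomalyDetector().fit(np.random.randn(1000, 32))
print(det.is_malicious(np.random.randn(5, 32) + 5.0))     # shifted -> flagged
```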
TPU as Cryptographic Accelerator
Polynomials defined over specific rings are heavily involved in various
cryptographic schemes, and the corresponding operations are usually the
computational bottleneck of the whole scheme.
We propose to utilize the TPU, emerging hardware designed for AI
applications, to speed up polynomial operations and turn the TPU into a
cryptographic accelerator.
We also conduct a preliminary evaluation and discuss the limitations of the
current work and our future plans.
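To illustrate why a matrix-multiply engine is a natural fit, the sketch
below expresses multiplication in Z_q[x]/(x^n + 1) (a negacyclic ring common
in lattice-based schemes) as a single matrix-vector product, the primitive
TPUs accelerate. The toy parameters and the choice of ring are illustrative
assumptions.

```python
# Illustrative reduction of ring-polynomial multiplication to a matmul.
import numpy as np

def negacyclic_matrix(a, q):
    """n x n matrix M such that M @ b == a * b mod (x^n + 1, q)."""
    n = len(a)
    M = np.zeros((n, n), dtype=np.int64)
    for i in range(n):
        for j in range(n):
            k = i - j
            M[i, j] = a[k] % q if k >= 0 else (-a[k + n]) % q  # wrap gains a minus sign
    return M

n, q = 8, 97                                      # toy values; real schemes use large n, q
a = np.random.randint(0, q, n)
b = np.random.randint(0, q, n)
c = negacyclic_matrix(a, q) @ b % q               # one matrix-vector product
print(c)
```

A TPU would run this product on its systolic matrix unit, which is the sense
in which polynomial operations become matrix multiplies; how modular
reduction is handled within the TPU's numeric formats is a practical
question the abstract itself does not detail.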